Search CORE

189 research outputs found

yrGATE: a web-based gene-structure annotation tool for the identification and dissemination of eukaryotic genes

Author: Brendel Volker
Schlueter Shannon D
Wilkerson Matthew D
Publication venue: BioMed Central
Publication date: 19/07/2006

Your Gene structure Annotation Tool for Eukaryotes (yrGATE) provides an Annotation Tool and Community Utilities for worldwide web-based community genome and gene annotation. Annotators can evaluate gene structure evidence derived from multiple sources to create gene structure annotations. Administrators regulate the acceptance of annotations into published gene sets. yrGATE is designed to facilitate rapid and accurate annotation of emerging genomes as well as to confirm, refine, or correct currently published annotations. yrGATE is highly portable and supports different standard input and output formats. The yrGATE software and usage cases are available at

Springer - Publisher Connector

PubMed Central

xGDB: open-source computational infrastructure for the integrated evaluation and analysis of genome features

Author: Brendel Volker
Dong Qunfeng
Schlueter Shannon D
Wilkerson Matthew D
Publication venue: BioMed Central
Publication date: 01/01/2006

The eXtensible Genome Data Broker (xGDB) provides a software infrastructure consisting of integrated tools for the storage, display, and analysis of genome features in their genomic context. Common features include gene structure annotations, spliced alignments, mapping of repetitive sequence, and microarray probes, but the software supports inclusion of any property that can be associated with a genomic location. The xGDB distribution and user support utilities are available online at the xGDB project website, http://xgdb.sourceforge.net/

Springer - Publisher Connector

PubMed Central

UNT Digital Library

ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking

Author: D. Neil Hayes
Garber
Hayes
Hoffmann
Matthew D. Wilkerson
Monti
Reich
Publication venue: Oxford University Press
Publication date: 01/01/2010

Summary: Unsupervised class discovery is a highly useful technique in cancer research, where intrinsic groups sharing biological characteristics may exist but are unknown. The consensus clustering (CC) method provides quantitative and visual stability evidence for estimating the number of unsupervised classes in a dataset. ConsensusClusterPlus implements the CC method in R and extends it with new functionality and visualizations including item tracking, item-consensus and cluster-consensus plots. These new features provide users with detailed information that enables more specific decisions in unsupervised class discovery
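The stability evidence mentioned in the summary above can be illustrated with a small sketch of the consensus matrix idea: items are repeatedly subsampled and clustered, and each pair of items is scored by how often the two land in the same cluster when both are sampled. This is not the ConsensusClusterPlus R implementation. The function names, the 80% subsampling fraction, and the use of single-linkage clustering as the inner algorithm are all assumptions made for illustration.

```python
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage(points, ids, k):
    """Stand-in clustering step (an assumption, not the package's method):
    merge the two closest clusters until only k remain."""
    clusters = [{i} for i in ids]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                euclidean(points[a], points[b])
                for a in clusters[pair[0]]
                for b in clusters[pair[1]]
            ),
        )
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


def consensus_matrix(points, k, n_resamples=50, fraction=0.8, seed=0):
    """Fraction of resamples in which each pair co-clusters, counted only
    over resamples where both items of the pair were drawn."""
    rng = random.Random(seed)
    n = len(points)
    size = min(n, max(k, round(fraction * n)))
    sampled = [[0] * n for _ in range(n)]
    together = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        ids = rng.sample(range(n), size)
        for a, b in combinations(ids, 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
        for cluster in single_linkage(points, ids, k):
            for a, b in combinations(cluster, 2):
                together[a][b] += 1
                together[b][a] += 1
    return [
        [
            1.0 if i == j
            else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
```

Consensus values near 0 or 1 for most pairs suggest a stable partition at that k; comparing matrices across several values of k is one way to judge the number of classes.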

Crossref

PubMed Central

Carolina Digital Repository

Cafeteria diet-induced obesity causes oxidative damage in white adipose

Author: Hayes D. Neil
Johnson Amy R.
Makowski Liza
Sampey Brante P.
Troester Melissa A.
Wilkerson Matthew D.
Publication date: 01/01/2016

Obesity continues to be one of the most prominent public health dilemmas in the world. The complex interaction among the varied causes of obesity makes it a particularly challenging problem to address. While typical high-fat purified diets successfully induce weight gain in rodents, we have described a more robust model of diet-induced obesity based on feeding rats a diet consisting of highly palatable, energy-dense human junk foods, the “cafeteria” diet (CAF, 45-53% kcal from fat). We previously reported that CAF-fed rats became hyperphagic, gained more weight, and developed more severe hyperinsulinemia, hyperglycemia, and glucose intolerance compared to the group fed a lard-based high-fat diet (45% kcal from fat). In addition, the CAF diet-fed group displayed a higher degree of inflammation in adipose and liver, mitochondrial dysfunction, and an increased concentration of lipid-derived, pro-inflammatory mediators. Building upon our previous findings, we aimed to determine mechanisms that underlie physiologic findings in the CAF diet. We investigated the effect of CAF diet-induced obesity on adipose tissue specifically using expression arrays and immunohistochemistry. Genomic evidence indicated the CAF diet induced alterations in the white adipose gene transcriptome, with notable suppression of glutathione-related genes and pathways involved in mitigating oxidative stress. Immunohistochemical analysis indicated a doubling in adipose lipid peroxidation marker 4-HNE levels compared to rats that remained lean on control standard chow diet. Our data indicate that the CAF diet drives an increase in oxidative damage in white adipose tissue that may affect tissue homeostasis. Oxidative stress drives activation of inflammatory kinases that can perturb insulin signaling leading to glucose intolerance and diabetes

PubMed Central

Carolina Digital Repository

BlackOPs: Increasing confidence in variant detection through mappability filtering

Author: Cabanski Christopher R
Hayes D Neil
Liu Jinze
Marron J S
Parker Joel S
Perou Charles M
Prins Jan F
Soloway Matthew
Wilkerson Matthew D
Publication venue: Digital Commons@Becker
Publication date: 01/01/2013

Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘mismapping’) and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklist

CiteSeerX

Digital Commons@Becker

PubMed Central

University of Kentucky

Carolina Digital Repository

ABRA: improved coding indel detection via assembly-based realignment

Author: Mose Lisle E.
Hayes D. Neil
Parker Joel S.
Perou Charles M.
Wilkerson Matthew D.
Publication date: 01/01/2014

Motivation: Variant detection from next-generation sequencing (NGS) data is an increasingly vital aspect of disease diagnosis, treatment and research. Commonly used NGS-variant analysis tools generally rely on accurately mapped short reads to identify somatic variants and germ-line genotypes. Existing NGS read mappers have difficulty accurately mapping short reads containing complex variation (i.e. more than a single base change), thus making identification of such variants difficult or impossible. Insertions and deletions (indels) in particular have been an area of great difficulty. Indels are frequent and can have substantial impact on function, which makes their detection all the more imperative. Results: We present ABRA, an assembly-based realigner, which uses an efficient and flexible localized de novo assembly followed by global realignment to more accurately remap reads. This results in enhanced performance for indel detection as well as improved accuracy in variant allele frequency estimation. Availability and implementation: ABRA is implemented in a combination of Java and C/C++ and is freely available for download at https://github.com/mozack/abra. Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online

PubMed Central

Carolina Digital Repository

SigFuge: Single gene clustering of RNA-seq reveals differential isoform usage among cancer samples

Author: Cabanski Christopher R
Hayes D. Neil
Johnson Amy R
Kimes Patrick K
Liu Yufeng
Maher Christopher A
Makowski Liza
Marron J. S
Perou Charles M
Wilkerson Matthew D
Zhao Ni
Publication venue: Digital Commons@Becker
Publication date: 01/01/2014

High-throughput sequencing technologies, including RNA-seq, have made it possible to move beyond gene expression analysis to study transcriptional events including alternative splicing and gene fusions. Furthermore, recent studies in cancer have suggested the importance of identifying transcriptionally altered loci as biomarkers for improved prognosis and therapy. While many statistical methods have been proposed for identifying novel transcriptional events with RNA-seq, nearly all rely on contrasting known classes of samples, such as tumor and normal. Few tools exist for the unsupervised discovery of such events without class labels. In this paper, we present SigFuge for identifying genomic loci exhibiting differential transcription patterns across many RNA-seq samples. SigFuge combines clustering with hypothesis testing to identify genes exhibiting alternative splicing, or differences in isoform expression. We apply SigFuge to RNA-seq cohorts of 177 lung and 279 head and neck squamous cell carcinoma samples from the Cancer Genome Atlas, and identify several cases of differential isoform usage including CDKN2A, a tumor suppressor gene known to be inactivated in a majority of lung squamous cell tumors. By not restricting attention to known sample stratifications, SigFuge offers a novel approach to unsupervised screening of genetic loci across RNA-seq cohorts. SigFuge is available as an R package through Bioconductor

Digital Commons@Becker

PubMed Central

Carolina Digital Repository


Integrated RNA and DNA sequencing improves mutation detection in low purity tumors

Author: Cabanski Christopher R.
Hammerman Peter S.
Hayes D. Neil
Hoadley Katherine A.
Mose Lisle E.
Parker Joel S.
Perou Charles M.
Sun Wei
Troester Melissa A.
Walter Vonn
Wilkerson Matthew D.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2014

Identifying somatic mutations is critical for cancer genome characterization and for prioritizing patient treatment. DNA whole exome sequencing (DNA-WES) is currently the most popular technology; however, this yields low sensitivity in low purity tumors. RNA sequencing (RNA-seq) covers the expressed exome with depth proportional to expression. We hypothesized that integrating DNA-WES and RNA-seq would enable superior mutation detection versus DNA-WES alone. We developed a first-of-its-kind method, called UNCeqR, that detects somatic mutations by integrating patient-matched RNA-seq and DNA-WES. In simulation, the integrated DNA and RNA model outperformed the DNA-WES only model. Validation by patient-matched whole genome sequencing demonstrated superior performance of the integrated model over DNA-WES only models, including a published method and published mutation profiles. Genome-wide mutational analysis of breast and lung cancer cohorts (n = 871) revealed remarkable tumor genomics properties. Low purity tumors experienced the largest gains in mutation detection by integrating RNA-seq and DNA-WES. RNA provided greater mutation signal than DNA in expressed mutations. Compared to earlier studies on this cohort, UNCeqR increased mutation rates of driver and therapeutically targeted genes (e.g. PIK3CA, ERBB2 and FGFR2). In summary, integrating RNA-seq with DNA-WES increases mutation detection performance, especially for low purity tumors

Harvard University - DASH

Digital Commons@Becker

PubMed Central

Carolina Digital Repository

SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements

Author: Chad Creighton
Charles M. Perou
Cheng Fan
Christopher R. Cabanski
D. Neil Hayes
Eric Bair
J. S. Marron
Jianying Li
Matthew D. Wilkerson
Michele C. Hayward
Xiaoying Yin
Yuan Qi
Publication venue: Public Library of Science
Publication date: 01/01/2010

Contemporary high dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e. gene selection), prior to deriving any biological conclusions. These steps can dramatically change the interpretation of an experiment. Evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods and investigators are often unsure of the best method. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternate data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements based on a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications give an indication of the variety of problems that SWISS can be helpful in solving. The SWISS analysis of one-color versus two-color microarrays provides investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits offered by this design. Analysis of the MAQC data shows differential intersite reproducibility by array platform. SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray
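As a rough illustration of the idea in the abstract above, a standardized within-class sum of squares can be read as the within-class sum of squared Euclidean distances divided by the total sum of squares, so that lower values mean the a priori classes are tighter relative to the overall spread. The sketch below assumes that reading; the function name `swiss_score` and the list-of-lists input layout are hypothetical, and the paper's own significance testing between methods is not reproduced here.

```python
from collections import defaultdict


def _mean_vector(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _sum_sq_dist(rows, center):
    """Sum of squared Euclidean distances from each row to center."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(row, center)) for row in rows
    )


def swiss_score(samples, labels):
    """Within-class sum of squares divided by total sum of squares.

    samples: list of numeric feature vectors, one per sample.
    labels:  a priori class label for each sample, in the same order.
    Lower scores indicate tighter clustering of the given classes.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    total_ss = _sum_sq_dist(samples, _mean_vector(samples))
    if total_ss == 0:
        raise ValueError("all samples are identical; score is undefined")
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)
    within_ss = sum(
        _sum_sq_dist(rows, _mean_vector(rows)) for rows in by_class.values()
    )
    return within_ss / total_ss
```

To compare two processing methods under this reading, one would compute the score on the same samples and labels after each method and prefer the method with the lower score.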

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository

BlackOPs: increasing confidence in variant detection through mappability filtering

Author: Cabanski Christopher R.
Liu Jinze
Marron J. S.
Hayes D. Neil
Parker Joel S.
Perou Charles M.
Prins Jan F.
Soloway Matthew
Wilkerson Matthew D.
Publication date: 01/01/2013

Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘mismapping’) and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing
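The filtering step described in the abstract above can be sketched as a set lookup keyed on position and allele. This is not BlackOPs code and does not reflect its file formats; the `Variant` record and the `filter_blacklisted` function are hypothetical names, and the in-memory representation is an assumption made to keep the example self-contained.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record: chromosome, position, allele."""
    chrom: str
    pos: int
    alt: str


def filter_blacklisted(
    calls: Iterable[Variant], blacklist: Iterable[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Split variant calls into those kept and those removed.

    A call is removed only when both its position and its allele match a
    blacklist entry, mirroring the 'positions and alleles' wording of the
    abstract. Input order is preserved in both output lists.
    """
    blocked = set(blacklist)
    kept: List[Variant] = []
    removed: List[Variant] = []
    for call in calls:
        (removed if call in blocked else kept).append(call)
    return kept, removed
```

Because the abstract reports that blacklist positions depend on the alignment algorithm and read length, a filter like this would only be meaningful with a blacklist generated for the same experimental setup as the calls.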

Carolina Digital Repository